approximate message
Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing
In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic ``generalization gap'', where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, yielding superior performance compared to GD; additionally, we implement noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.
Multi-layer State Evolution Under Random Convolutional Design
Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d.
All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation
Similarly, the ISOMAP face database consists of images (256 levels of gray) of size $64 \times 64$, i.e., vectors in $\mathbb{R}^{4096}$, whereas the correct intrinsic dimension is only 3 (for the vertical pose, horizontal pose, and lighting direction). The second approach is an average-case approach (in the spirit of the statistical mechanics treatment of high-dimensional systems), which models feature vectors by a random ensemble, taken as a set of random vectors with independent and identically distributed (i.i.d.) components, and a small but fixed fraction of non-zero components.
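The random ensemble described in this excerpt can be illustrated with a minimal NumPy sketch. The function name `sample_sparse_ensemble`, the sparsity level `rho = 0.05`, and the choice of standard Gaussian non-zero entries are assumptions made only for illustration; the excerpt specifies i.i.d. components with a small fixed fraction of non-zeros and nothing more.

```python
import numpy as np

def sample_sparse_ensemble(num_vectors, dim, rho, seed=0):
    """Draw vectors whose components are i.i.d. and non-zero with probability rho."""
    rng = np.random.default_rng(seed)
    # Each component is switched on independently with probability rho.
    support = rng.random((num_vectors, dim)) < rho
    # Assumption: non-zero entries are standard Gaussian (not fixed by the source).
    values = rng.standard_normal((num_vectors, dim))
    return np.where(support, values, 0.0)

# dim = 4096 matches the 64 x 64 image example; rho is a hypothetical value.
X = sample_sparse_ensemble(num_vectors=200, dim=4096, rho=0.05)
print(X.shape, (X != 0).mean())
```

The empirical fraction of non-zero components should sit close to `rho`, which is the sense in which the fraction is "small but fixed" as the dimension grows.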
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Defilippis, Leonardo, Xu, Yizhou, Girardin, Julius, Troiani, Emanuele, Erba, Vittorio, Zdeborová, Lenka, Loureiro, Bruno, Krzakala, Florent
Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.